Software USA 4 #4

Text File | 1997-01-06 | 10.0 KB | 94 lines | [ttro/ttxt]

The easiest way to learn how to use Add/Strip (A/S) is to experiment. There are usually several combinations of settings that will create the same, or nearly the same, document, so you’ll need to find the settings that work best with your documents. To illustrate a few of the things A/S can do, I have included some demos. Each folder contains one or more TEXT files and settings files. The settings stored in the settings files control how A/S processes the TEXT files. See the comments in each settings file for more information.

BOL Stripping ƒ
----------------------------------------------

This demo uses some text that an A/S user saved from the CNET Web site and sent to me. He wanted to remove the text that runs down the left side of the document and convert the review text that runs down the right side into paragraphs.

As it turns out, this isn’t too hard for this document, since there are two returns after the last line of each paragraph, and both the text we want to remove and the text we want to keep are formatted consistently throughout the document. That is, the text we want to keep (positions 17 and up) never starts in the character range used by the text we want to remove (positions 1-16).

I’ve included two settings files that remove the unwanted text at the start of each line and make paragraphs of the remaining text. Both use one or more Line Replacements sets to remove the first sixteen characters of each line; lines shorter than sixteen characters are effectively emptied. The “Better Tidy” settings file is preferable to the “Tidy up review” settings because it needs only one Line Replacements set to remove the leading characters. It relies on the fact that, when the find string consists only of Any Char wildcards, A/S matches the entire line if the line is no longer than the number of wildcards in the find string.
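The two settings files do this with wildcards inside A/S, but the end result can be sketched in a few lines of Python for readers who want to see the logic spelled out. This is not how A/S works internally. It assumes CR line endings, the 16-character left column described above, and that an emptied line marks a paragraph break. The function name `strip_left_column` and the sample text are hypothetical.

```python
def strip_left_column(text: str, width: int = 16) -> str:
    """Remove the first `width` characters of each line, then rebuild paragraphs.

    Lines no longer than `width` end up empty, much as a find string of
    sixteen Any Char wildcards ("^?") matches the whole of a short line.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    for line in text.split("\r"):          # Mac TEXT files end lines with CR
        kept = line[width:].strip()        # drop the left column and padding
        if kept:
            current.append(kept)           # still inside a paragraph
        elif current:
            paragraphs.append(" ".join(current))  # emptied line = paragraph break
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\r".join(paragraphs)           # one CR per paragraph


if __name__ == "__main__":
    # Made-up sample: left column in positions 1-16, review text from 17 on.
    sample = "\r".join([
        "Reviewer: Joe".ljust(16) + "The new modem is",
        " " * 16 + "fast and cheap.",
        "Rating: 4",
        " " * 16 + "Setup took ten",
        " " * 16 + "minutes.",
    ])
    print(strip_left_column(sample).split("\r"))
    # ['The new modem is fast and cheap.', 'Setup took ten minutes.']
```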
Database Demos ƒ
----------------------------------------------

Manipulating TEXT files that come from, or are destined for, a database application is a popular use of A/S. To that end, A/S can convert to and from tab-delimited format in several ways. A/S can also deal with columnar text, that is, text padded with spaces so that it appears in columns when displayed in a monospaced font like Monaco.

The example file included here is the output of a database application. Each line contains a 5-digit user code, a space, and then the user name.

In the first settings file, Process Users.dat, I convert the file so that a comma separates the user code and user name and quotes surround both elements. I also append a date (with quotes around it) to each line. I chose to use Line Replacements sets to replace the sixth character of each line with a quote-comma-quote sequence. I use Any Char (“^?”) wildcards for matching and replacing because I want to specify the sixth character by position. I also used Line Replacements sets to append the date sequence; Line Replacements sets are the best way to insert characters at the end of lines. Since I still needed to insert a quote at the start of each line, I changed the Leading Chars string to a single quote and set the Leading Chars operation to Insert.

The best way to insert characters at the start of lines is to use Line Replacements strings. The Leading Chars string also works, as it does here, but it allows fewer characters than a Line Replacements string.

Another method, which should produce the same results, would be to use Replacements and/or Line Replacements sets to insert tabs into the lines at the appropriate places, and then use Tab Delimited -> CSV as the main process to convert the lines so that the items have commas between them and quotes enclosing them.
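Both approaches described above can be compared in a short Python sketch. The function names and the `DATE` value are hypothetical, the input is assumed to look like `12345 Ann Lee`, and the sequence the demo appends is assumed to close the name’s quote and add a quoted date. Python’s `csv` module stands in for the tab-delimited-to-CSV conversion; it is not A/S.

```python
import csv
import io

DATE = "02/12/96"  # hypothetical; use whatever date your settings file inserts


def quote_by_position(line: str, date: str = DATE) -> str:
    """Process Users.dat style: edit by character position.

    A quote is inserted at the start (the Leading Chars step), the sixth
    character (the space after the 5-digit code) becomes '","', and a
    quoted date is appended at the end of the line.
    """
    return '"' + line[:5] + '","' + line[6:] + '","' + date + '"'


def quote_via_tabs(line: str, date: str = DATE) -> str:
    """Alternative: insert tabs, then convert tab-delimited fields to CSV."""
    tabbed = line[:5] + "\t" + line[6:] + "\t" + date
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(tabbed.split("\t"))
    return out.getvalue().rstrip("\n")


if __name__ == "__main__":
    row = "12345 Ann Lee"  # made-up sample record
    print(quote_by_position(row))  # "12345","Ann Lee","02/12/96"
    print(quote_via_tabs(row))     # "12345","Ann Lee","02/12/96"
```

One difference worth noting: the `csv` route doubles any quote characters that appear inside a name, whereas purely position-based editing leaves them alone.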
In the second settings file, Process Users2.dat, I insert a date between the user code and the user name, with a single tab character between the user code and the date and another between the date and the user name. There are several ways to do this, but I chose to use a Line Replacements set to insert the date string after the five digits and the space at the start of each line. The find string “^#^#^#^#^#^32” specifies any five digits followed by a space. I could simply have typed a space, but I thought the character code for a space, “^32”, would make it more obvious in the scrolling list that there was something after the Any Digit wildcards. To keep the matched digits unchanged, I used Any Char wildcards in the change string to stand for the five existing characters, followed by the date string I want to insert: “^?^?^?^?^?02/12/96”. Because the find string matches six characters and the change string keeps only five of them, this sequence effectively replaces the space between the user code and the user name with the date. Since I only needed to add two tabs, I specified just two character insertions of a single tab character (“^t”).

HTML Convert ƒ
----------------------------------------------

There are two demos that show how A/S can manipulate HyperText Markup Language (HTML) files. HTML defines a set of formatting codes that are added to a text document to specify formatting options like font size and font style. The coding can be quite complex. World Wide Web pages are formatted using HTML.

The first demo, Remove HTML Codes, takes advantage of the fact that HTML surrounds formatting codes with less than (“<”) and greater than (“>”) characters. Thanks to the Any Chars wildcard (“^@”), I can use a search string of “<^@>” to match each code, including everything between the less than and greater than characters, and then remove it. I use Replacements for this search because these formatting codes can span lines.
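For comparison, the “<^@>” search can be approximated with a Python regular expression. This sketch assumes the Any Chars wildcard stops at the first “>” it reaches (otherwise everything from the first “<” to the last “>” would be removed), so it uses a non-greedy pattern. `re.DOTALL` lets a code span line breaks, which is why the demo uses Replacements. The names `HTML_CODE` and `remove_html_codes` are hypothetical, and the character-code translation and other cleanup steps are not covered.

```python
import re

# A/S search string "<^@>": a "<", any run of characters, then ">".
# ".*?" is non-greedy so each code ends at the first ">", and re.DOTALL
# lets "." match newlines so codes that span lines are still removed.
HTML_CODE = re.compile(r"<.*?>", re.DOTALL)


def remove_html_codes(html: str) -> str:
    """Delete every <...> formatting code, leaving the text between them."""
    return HTML_CODE.sub("", html)


if __name__ == "__main__":
    page = '<p>Hello, <a\nhref="index.html">world</a>!</p>'  # made-up sample
    print(remove_html_codes(page))  # Hello, world!
```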
I also replace HTML character codes with the corresponding Macintosh characters. In addition, I limit the number of lines, strip leading white space, and normalize spacing. I simply use Read & Write as the main process, but for simple, text-rich HTML documents this could perhaps be changed to one of the Make Paragraphs processes.

The second demo, HTML -> RTF Codes, attempts to preserve some of the HTML formatting information by converting the HTML text file to a Rich Text Format (RTF) file. In many ways RTF is similar to HTML, but some tricks are needed to get A/S to create RTF files the way I want.

RTF files contain a header that describes the fonts, style sheets, colors, and various other settings. The header I want to use is less than 2048 characters long, so I can add it directly with the BOF insert string. Because this header is inserted directly into the text read from the input file, it is acted on by all of the following A/S processes, including replacement sets. The header must contain left and right brace characters, and I want a replacement set to replace the existing brace characters with the RTF code for those characters, so I use a unique character sequence for the braces in the header. This sequence is changed back to proper brace characters after the existing brace characters have been encoded. I use a similar scheme to leave placeholders in the header for the font style definitions, which are then filled in by a replacement set.

Since the HTML -> RTF Codes settings make extensive use of the Any Chars wildcard, it is advisable to increase A/S’s preferred memory size so that the input file is read in larger chunks; matches that span two chunks are then less likely to be missed.

This demo handles many HTML codes and does a reasonable job of converting them to RTF. However, keep in mind that A/S is simply searching and replacing text.
There is no interpretation of the codes and no stacking and unstacking of formatting states. If the HTML file contains mistakes, if certain required formatting commands are missing, or if it uses codes that are not covered by specific replacement sets, the conversion may not work as expected. I’ve actually included two settings files that attempt to convert HTML documents to RTF documents. They are identical except that one converts <BR> codes to carriage returns and the other converts <BR> codes to the end-of-line marks that Word generates when you press Shift-Return. Forms and tables are not currently handled by this demo.

Make Paragraphs ƒ
----------------------------------------------

Here is a very basic operation: selectively removing carriage returns (CRs) to make paragraphs. In other words, CRs are used only at the end of a paragraph, not at the end of each line, so the text can wrap at the paragraph margins. The only tricky part is deciding which CRs should be removed. A/S tries to use blank lines and indentation to determine where paragraphs start (the Make Paragraphs process). When indentation is not a reliable indicator of a new paragraph, A/S can be told to ignore indentation and use only blank lines (the Make Paragraphs2 process). Of course, if paragraph starts have more indentation than other lines, it may be possible to remove all but these extra spaces and then let A/S use the remaining indentation to find the paragraphs. The Process Shareware2.txt settings file illustrates this technique on the Shareware2.txt file.

There are three settings files in this folder. One of them, Process Register.txt/PCNET.TXT, is meant to process both the Register.txt and PCNET.TXT files. PCNET.TXT contains line feeds, but A/S removes them automatically when making paragraphs.
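The paragraph-detection rules above can be sketched in Python. This is a simplified, hypothetical model of the two processes, not A/S’s actual heuristic: a blank line always ends a paragraph, an indented line also starts one when `use_indent` is true (roughly Make Paragraphs), and indentation is ignored when it is false (roughly Make Paragraphs2). Line feeds, as in PCNET.TXT, are assumed to be line breaks and are folded into CRs.

```python
def make_paragraphs(text: str, use_indent: bool = True) -> str:
    """Join lines into paragraphs so a CR appears only at each paragraph end.

    A blank line always ends a paragraph. With use_indent=True an indented
    line also starts a new paragraph; with use_indent=False indentation is
    ignored and only blank lines count.
    """
    # Fold CR-LF pairs and stray line feeds into plain CR line endings.
    lines = text.replace("\r\n", "\r").replace("\n", "\r").split("\r")
    paragraphs: list[str] = []
    current: list[str] = []
    for line in lines:
        blank = not line.strip()
        indented = line.startswith((" ", "\t"))
        if current and (blank or (use_indent and indented)):
            paragraphs.append(" ".join(current))  # close the open paragraph
            current = []
        if not blank:
            current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return "\r".join(paragraphs)


if __name__ == "__main__":
    doc = "First line\rcontinues here.\r  Second para\rgoes on.\r\rThird."
    print(make_paragraphs(doc).split("\r"))
    # ['First line continues here.', 'Second para goes on.', 'Third.']
    print(make_paragraphs(doc, use_indent=False).split("\r"))
    # ['First line continues here. Second para goes on.', 'Third.']
```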
Sequential Replacements ƒ
----------------------------------------------

Here I use three replacement sets to convert a Word 5 document that contains the raw output of Word’s indexing function. I wanted a single tab between each index item’s text and its page numbers instead of the space that Word used. Using three replacement sets and wildcards, I replace the target space with a tab. This illustrates a method of temporarily marking a character sequence to prevent it from being changed, performing the replacements, and then restoring the marked sequence.
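The three replacement sets themselves aren’t listed here, so the following Python sketch is only a hypothetical illustration of the mark, replace, restore pattern. It assumes each index line looks like `Apple pie 3, 7` (entry text, one space, then page numbers separated by “, ”) and uses a control character as the temporary marker; the real demo may tag a different sequence. The names `MARK` and `index_space_to_tab` are made up for this example.

```python
import re

MARK = "\x01"  # hypothetical placeholder; any sequence absent from the text works


def index_space_to_tab(line: str) -> str:
    """Turn 'Apple pie 3, 7' into 'Apple pie<TAB>3, 7' in three passes."""
    # Pass 1: tag the ", " separators so they no longer contain a space.
    line = line.replace(", ", "," + MARK)
    # Pass 2: the page list now has no spaces, so the last space on the
    # line is the one between the entry text and its page numbers.
    line = re.sub(r" (?=\S+$)", "\t", line)
    # Pass 3: restore the tagged separators.
    return line.replace("," + MARK, ", ")


if __name__ == "__main__":
    print(index_space_to_tab("Apple pie 3, 7").split("\t"))
    # ['Apple pie', '3, 7']
```

Without pass 1, “the last space on the line” would be the one inside “3, 7”, which is exactly the kind of sequence the temporary mark protects.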